41 research outputs found
Deep Convolutional Ranking for Multilabel Image Annotation
Multilabel image annotation is one of the most important challenges in
computer vision, with many real-world applications. While existing work usually
uses conventional visual features for multilabel annotation, features based on
Deep Neural Networks have shown the potential to significantly boost
performance. In this work, we propose to leverage such features and analyze the
key components that lead to better performance. Specifically, we show that a
significant performance gain can be obtained by combining convolutional
architectures with approximate top-k ranking objectives, as they naturally
fit the multilabel tagging problem. On the NUS-WIDE dataset, our approach
outperforms conventional visual features by about 10%, obtaining the best
reported performance in the literature.
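As a concrete illustration of the ranking objectives mentioned above, here is a minimal NumPy sketch of a pairwise hinge ranking loss for multilabel tagging; the WARP-style rank weighting used in this line of work is omitted for brevity, and all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def pairwise_ranking_loss(scores, positives, margin=1.0):
    """Pairwise hinge ranking loss for multilabel tagging.

    For every (positive, negative) label pair, penalise any negative
    label scored within `margin` of a positive one. This is the core
    idea behind approximate top-k / WARP-style objectives; the rank
    weighting is omitted here.
    """
    scores = np.asarray(scores, dtype=float)
    pos = sorted(positives)
    neg = [i for i in range(len(scores)) if i not in positives]
    loss = 0.0
    for p in pos:
        for n in neg:
            loss += max(0.0, margin - scores[p] + scores[n])
    return loss / max(1, len(pos) * len(neg))
```

When all positive labels are scored at least `margin` above every negative label, the loss is exactly zero, which is why such objectives push true tags toward the top of the ranking.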
Large-scale image retrieval using similarity preserving binary codes
Image retrieval is a fundamental problem in computer vision, and has many applications. When the dataset size gets very large, retrieving images in Internet image collections becomes very challenging. The challenges come from storage, computation speed, and similarity representation. My thesis addresses learning compact similarity-preserving binary codes, which represent each image by a short binary string, for fast retrieval in large image databases.

I will first present an approach called Iterative Quantization to convert high-dimensional vectors to compact binary codes, which works by learning a rotation to minimize the quantization error of mapping data to the vertices of a binary Hamming cube. This approach achieves state-of-the-art accuracy for preserving neighbors in the original feature space, as well as state-of-the-art semantic precision.

Second, I will extend this approach to two different scenarios in large-scale recognition and retrieval problems. The first extension is aimed at high-dimensional histogram data, such as bag-of-words features or text documents. Such vectors are typically sparse and nonnegative. I develop an algorithm that exploits the special structure of such data by mapping feature vectors to binary vertices in the positive orthant, which gives improved performance. The second extension is for Fisher Vectors, which are dense descriptors having tens of thousands to millions of dimensions. I develop a novel method for converting such descriptors to compact similarity-preserving binary codes that exploits their natural matrix structure to reduce their dimensionality using compact bilinear projections instead of a single large projection matrix. This method achieves retrieval and classification accuracy comparable to that of the original descriptors and to the state-of-the-art Product Quantization approach, while having orders of magnitude faster code generation time and a smaller memory footprint.
Finally, I present two applications of using Internet images and tags/labels to learn binary codes with label supervision, and show improved retrieval accuracy on several large Internet image datasets. First, I will present an application that performs cross-modal retrieval in the Hamming space. Then I will present an application that uses supervised binary classeme representations for large-scale image retrieval.
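To illustrate why short binary strings enable fast retrieval in large collections, here is a minimal sketch (not the thesis's code) of Hamming-distance ranking over binary codes packed into 64-bit words:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a query binary code.

    Codes are packed into uint64 words; a single XOR plus a popcount
    gives the distance, which is what makes retrieval with compact
    binary codes fast and memory-light.
    """
    x = np.bitwise_xor(db_codes, query_code)
    # popcount per item: view each 64-bit word as bytes and count set bits
    dist = np.unpackbits(x.view(np.uint8), axis=-1).sum(axis=-1)
    return np.argsort(dist, kind="stable"), dist
```

A 64-bit code per image means a billion-image index fits in 8 GB, and the XOR/popcount pipeline is a handful of machine instructions per comparison.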
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
Multi-label image classification is a fundamental but challenging task
towards general visual understanding. Existing methods have found that
region-level cues (e.g., features from RoIs) can facilitate multi-label
classification. Nevertheless, such methods usually require laborious
object-level annotations (i.e., object labels and bounding boxes) to learn
effective object-level visual features. In this paper, we propose a novel and
efficient deep framework that boosts multi-label classification by distilling
knowledge from a weakly-supervised detection task without bounding box
annotations.
Specifically, given the image-level annotations, (1) we first develop a
weakly-supervised detection (WSD) model, and then (2) construct an end-to-end
multi-label image classification framework augmented by a knowledge
distillation module that guides the classification model by the WSD model
according to the class-level predictions for the whole image and the
object-level visual features for object RoIs. The WSD model is the teacher
model and the classification model is the student model. After this cross-task
knowledge distillation, the performance of the classification model is
significantly improved and the efficiency is maintained since the WSD model can
be safely discarded in the test phase. Extensive experiments on two large-scale
datasets (MS-COCO and NUS-WIDE) show that our framework outperforms
state-of-the-art methods in both accuracy and efficiency.
Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 tables
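The class-level guidance described above can be sketched as a soft-target distillation loss. This is a hedged illustration, not the paper's exact formulation: the plain per-class BCE terms and the `alpha` balancing weight are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distillation_loss(student_logits, teacher_probs, labels, alpha=0.5):
    """Cross-task distillation objective (illustrative sketch).

    Combines the usual multilabel BCE against ground-truth labels with a
    BCE term that pulls the student classifier's class probabilities
    toward the teacher's (here, a weakly-supervised detector's) soft
    class-level predictions.
    """
    p = sigmoid(student_logits)
    eps = 1e-12  # numerical safety for log
    bce_gt = -np.mean(labels * np.log(p + eps)
                      + (1 - labels) * np.log(1 - p + eps))
    bce_kd = -np.mean(teacher_probs * np.log(p + eps)
                      + (1 - teacher_probs) * np.log(1 - p + eps))
    return alpha * bce_gt + (1 - alpha) * bce_kd
```

Because the teacher only contributes soft targets at training time, it can be discarded at test time, which is the efficiency argument the abstract makes.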
Experimental exploration of five-qubit quantum error correcting code with superconducting qubits
Quantum error correction is an essential ingredient for universal quantum
computing. Despite tremendous experimental efforts in the study of quantum
error correction, to date, there has been no demonstration of a universal
quantum error correcting code with the subsequent verification of all key
features, including the identification of an arbitrary physical error, the
capability for transversal manipulation of the logical state, and state
decoding. To address this challenge, we experimentally realise the five-qubit
code, the smallest perfect code that permits correction of generic
single-qubit errors. In the experiment, having optimised the encoding circuit,
we employ an array of superconducting qubits to realise the code for several
typical logical states, including the magic state, an indispensable resource
for realising non-Clifford gates. The encoded states are prepared with high
average fidelity, and with even higher fidelity within the code space. Then,
arbitrary single-qubit errors introduced manually are identified by measuring
the stabilizers. We further implement logical Pauli operations with high
fidelity within the code space. Finally, we realise the decoding circuit and
recover the input state with high overall fidelity. Our work demonstrates each
key aspect of the code and verifies the viability of experimental realisation
of quantum error correcting codes with superconducting qubits.
Comment: 6 pages, 4 figures + Supplementary Material
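The stabilizer-based error identification described above can be illustrated classically: for the five-qubit perfect code, every single-qubit Pauli error produces a distinct four-bit syndrome. A small sketch using the standard generators (this is a pen-and-paper check, not the experiment's code):

```python
def syndrome(error, stabilizers):
    """4-bit syndrome of a Pauli error string against stabilizer generators.

    Two single-qubit Paulis anticommute iff both are non-identity and
    different; a stabilizer measurement flips when the error anticommutes
    with the generator on an odd number of qubits.
    """
    bits = []
    for g in stabilizers:
        anti = sum(1 for e, s in zip(error, g)
                   if e != "I" and s != "I" and e != s)
        bits.append(anti % 2)
    return tuple(bits)

# Standard generators of the five-qubit (smallest perfect) code
GENS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

# All 15 weight-one Pauli errors: X, Y, Z on each of the 5 qubits
errors = ["I" * q + p + "I" * (4 - q) for q in range(5) for p in "XYZ"]
syndromes = {e: syndrome(e, GENS) for e in errors}
```

The 15 syndromes are distinct and nonzero, and together with the trivial syndrome they exhaust all 2^4 = 16 possibilities, which is exactly what "perfect code" means.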
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored problem of large-scale
cross-modal visual localization by matching ground RGB images to a
geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior
works were demonstrated on small datasets and did not lend themselves to
scaling up for large-scale applications. To enable large-scale evaluation, we
introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of
RGB and aerial LIDAR depth images. We propose a novel joint embedding based
method that effectively combines the appearance and semantic cues from both
modalities to handle drastic cross-modal variations. Experiments on the
proposed dataset show that our model achieves a strong result, a median rank
of 5, in matching across a large test set of 50K location pairs collected from
a 14 km^2 area. This represents a significant advancement over prior work in
both performance and scale. We conclude with qualitative results that
highlight the challenging nature of this task and the benefits of the proposed
model. Our work provides a foundation for further research in cross-modal
visual localization.
Comment: ACM Multimedia 202
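The evaluation protocol above, ranking geo-referenced LIDAR depth embeddings against an RGB query and reporting the median rank of the true match, can be sketched as follows; the embedding arrays and function names are illustrative, not the paper's implementation:

```python
import numpy as np

def median_rank(rgb_emb, lidar_emb):
    """Median rank of the true match under cosine similarity.

    rgb_emb[i] and lidar_emb[i] are assumed to be L2-normalised
    embeddings of the same location produced by the two modality
    branches of a joint-embedding model.
    """
    sims = rgb_emb @ lidar_emb.T       # pairwise cosine similarities
    order = np.argsort(-sims, axis=1)  # best candidate first
    ranks = [int(np.where(order[i] == i)[0][0]) + 1
             for i in range(len(rgb_emb))]
    return float(np.median(ranks))
```

A median rank of 5 over 50K candidates means that for half the queries the correct location is among the model's top five guesses.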
Overall and cause-specific mortality rates among men and women with high exposure to indoor air pollution from the use of smoky and smokeless coal: a cohort study in Xuanwei, China
OBJECTIVES: Never-smoking women in Xuanwei (XW), China, have some of the highest lung cancer rates in the country. This has been attributed to the combustion of smoky coal used for indoor cooking and heating. The aim of this study was to evaluate the spectrum of cause-specific mortality in this unique population, including among those who use smokeless coal, considered 'cleaner' coal in XW, as this has not been well-characterised.
DESIGN: Cohort study.
SETTING: XW, a rural region of China where residents routinely burn coal for indoor cooking and heating.
PARTICIPANTS: Age-adjusted, cause-specific mortality rates between 1976 and 2011 were calculated and compared among lifetime smoky and smokeless coal users in a cohort of 42 420 men and women from XW. Mortality rates for XW women were compared with those for a cohort of predominantly never-smoking women in Shanghai.
RESULTS: Mortality in smoky coal users was driven by cancer (41%), with lung cancer accounting for 88% of cancer deaths. In contrast, cardiovascular disease (CVD) accounted for 32% of deaths among smokeless coal users, with 7% of deaths from cancer. Total cancer mortality was four times higher among smoky coal users relative to smokeless coal users, particularly for lung cancer (standardised rate ratio (SRR)=17.6). Smokeless coal users had higher mortality rates of CVD (SRR=2.9) and pneumonia (SRR=2.5) compared with smoky coal users. These patterns were similar in men and women, even though XW women rarely smoked cigarettes. Women in XW, regardless of coal type used, had over a threefold higher rate of overall mortality, and most cause-specific outcomes were elevated compared with women in Shanghai.
CONCLUSIONS: Cause-specific mortality burden differs in XW based on the lifetime use of different coal types. These observations provide evidence that eliminating all coal use for indoor cooking and heating is an important next step in improving public health, particularly in developing countries.
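For readers unfamiliar with the standardised rate ratio (SRR) reported above, here is a minimal sketch of direct age standardisation with made-up numbers (these counts are illustrative only, not the study's data):

```python
def age_standardised_rate(deaths, person_years, std_weights):
    """Directly age-standardised mortality rate per 100,000 person-years.

    `deaths` and `person_years` are per-age-stratum counts for one
    exposure group; `std_weights` are the standard population's
    proportions in the same strata (they should sum to 1).
    """
    rate = sum(w * d / py
               for d, py, w in zip(deaths, person_years, std_weights))
    return rate * 100_000

# Illustrative two-stratum example; the SRR is the ratio of the two
# standardised rates, as in SRR = rate(smoky) / rate(smokeless).
smoky = age_standardised_rate([30, 80], [10_000, 5_000], [0.6, 0.4])
smokeless = age_standardised_rate([10, 20], [10_000, 5_000], [0.6, 0.4])
srr = smoky / smokeless
```

Standardising both groups to the same age distribution removes differences that would otherwise be driven purely by one group being older than the other.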
Comparing data-dependent and data-independent embeddings for classification and ranking of Internet images
This paper presents a comparative evaluation of feature embeddings for classification and ranking in large-scale Internet image datasets. We follow a popular framework for scalable visual learning, in which the data is first transformed by a nonlinear embedding and then an efficient linear classifier is trained in the resulting space. Our study includes data-dependent embeddings inspired by the semi-supervised learning literature, and data-independent ones based on approximating specific kernels (such as the Gaussian kernel for GIST features and the histogram intersection kernel for bags of words). Perhaps surprisingly, we find that data-dependent embeddings, despite being computed from large amounts of unlabeled data, do not have any advantage over data-independent ones in the regime of scarce labeled data. On the other hand, we find that several data-dependent embeddings are competitive with popular data-independent choices for large-scale classification.
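As a concrete example of a data-independent embedding of the kind compared above, random Fourier features approximate the Gaussian kernel with a projection drawn once, independently of the data (a sketch in the spirit of Rahimi and Recht's random features; parameter names are illustrative):

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    """Data-independent embedding approximating the Gaussian kernel.

    After the map, inner products approximate the kernel:
        z(x) . z(y)  ~  exp(-gamma * ||x - y||^2)
    The projection W and phases b are sampled without looking at the
    data, which is what "data-independent" means in this comparison.
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

A linear classifier trained on `z(X)` then behaves approximately like a kernel machine, at a fraction of the training cost, which is the scalable-learning framework the paper evaluates.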
Iterative quantization: A Procrustean approach to learning binary codes
This paper addresses the problem of learning similarity-preserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube. This method, dubbed iterative quantization (ITQ), has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). Our experiments show that the resulting binary coding schemes decisively outperform several other state-of-the-art methods.
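The alternating minimization described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation of the ITQ update (random orthogonal initialisation and iteration count are assumptions), not the authors' code:

```python
import numpy as np

def itq_rotation(V, n_iter=50, seed=0):
    """Iterative Quantization: learn a rotation R minimising the
    quantization error ||sign(V R) - V R||_F^2 for zero-centered,
    PCA-projected data V (n samples x c bits).

    Alternates between fixing R to set B = sign(V R), and solving the
    orthogonal Procrustes problem for R from the SVD of B^T V.
    """
    rng = np.random.default_rng(seed)
    c = V.shape[1]
    R, _ = np.linalg.qr(rng.normal(size=(c, c)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                 # closest hypercube vertices
        U, _, Vt = np.linalg.svd(B.T @ V)  # Procrustes step for R
        R = (U @ Vt).T
    return R, np.sign(V @ R)
```

Each step can only lower (or keep) the quantization error, so the scheme converges; the learned R is then applied to new data before taking signs to produce binary codes.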